Abstract

General theorems concerning the strong consistency of the MLE of exponential
mixture parameters are proved. [...] consistency of the MLE of normal mixture
parameters [...] organized into "fields" [...]

1. Introduction

In [5] a statistical model for LANDSAT agricultural data based on normal
mixtures was introduced which admits a specific kind of dependence among the
observations, namely their association into fields, each representing a single
agricultural class. Necessary conditions were derived for a maximum likelihood
estimate of the parameters of the model and a numerical procedure for
solution of the likelihood equations was suggested. The question of the
consistency of the maximum likelihood estimate is complicated by the fact
that it is no longer possible to reduce the sample to a set of independent
identically distributed variables. The purpose of this note is to establish
a general theorem on the existence of a consistent maximum likelihood estimate
when the observations are not identically distributed and to show its
applicability to the statistical model described in detail below.


We assume that each pixel is identified by a pair $(j,k)$ of positive
integers, where the first index $j$, $1 \le j \le p$, identifies the field containing
the pixel and the second index $k$, $1 \le k \le N_j$, distinguishes it from other
pixels in the same field. We suppose that the field structure is predetermined,
perhaps as part of a spatial clustering algorithm such as AMOEBA. Let
$X_{jk}$ be the random vector of spectral measurements from pixel $(j,k)$ and
let $\Theta_{jk} \in \{1, \ldots, m\}$ be an unobserved random variable indicating its class
index. We assume that the class indices $\Theta_{j1}, \Theta_{j2}, \ldots, \Theta_{jN_j}$ from the $j$th
field are all the same and denote their common value by $\Theta_j$. We further
assume that, conditioned on $\Theta_j = \ell$, the measurements $X_{j1}, \ldots, X_{jN_j}$ are
independently distributed as $N_n(\mu_\ell^0, \Sigma_\ell^0)$, the $n$-variate normal with
unknown mean $\mu_\ell^0$ and unknown covariance $\Sigma_\ell^0$. Let $X_j = (X_{j1}, \ldots, X_{jN_j})$.
Our final assumptions are that $(X_1, \Theta_1), \ldots, (X_p, \Theta_p)$ are independent
and that the $\{\Theta_j\}$ are identically distributed with unknown
$\alpha_\ell^0 = \mathrm{Prob}[\Theta = \ell] > 0$.
Under these assumptions, the joint density of all the observations is


$$p(x_1, \ldots, x_p) = \prod_{j=1}^{p} \sum_{\ell=1}^{m} \alpha_\ell^0 \prod_{k=1}^{N_j} N_n(x_{jk};\, \mu_\ell^0, \Sigma_\ell^0) \tag{1}$$


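where $x_j = (x_{j1}, \ldots, x_{jN_j}) \in \mathbb{R}^{nN_j}$. This joint density is parametrized by
$\{(\alpha_\ell, \mu_\ell, \Sigma_\ell) : \ell = 1, \ldots, m\}$, where $\alpha_\ell > 0$; $\sum_\ell \alpha_\ell = 1$; $\mu_\ell \in \mathbb{R}^n$; and $\Sigma_\ell$ is
a real $n \times n$ positive definite symmetric matrix. For convenience, we let
$\psi = (\alpha_\ell, \mu_\ell, \Sigma_\ell)_{\ell=1}^{m}$ denote an arbitrary member of the parameter
space and $\psi^0$ the true value of the parameter. Thus the likelihood function
corresponding to the sample $x_1, \ldots, x_p$ is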


$$L(\psi;\, x_1, \ldots, x_p) = \prod_{j=1}^{p} \sum_{\ell=1}^{m} \alpha_\ell \prod_{k=1}^{N_j} N_n(x_{jk};\, \mu_\ell, \Sigma_\ell). \tag{2}$$


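For $x_j = (x_{j1}, \ldots, x_{jN_j}) \in \mathbb{R}^{nN_j}$ let

$$m_j = \frac{1}{N_j} \sum_{k=1}^{N_j} x_{jk}$$

and

$$S_j = \sum_{k=1}^{N_j} (x_{jk} - m_j)(x_{jk} - m_j)^T$$

be the mean and scatter matrix respectively of the vectors $x_{j1}, \ldots, x_{jN_j}$. Then

$$\prod_{k=1}^{N_j} N_n(x_{jk};\, \mu_\ell, \Sigma_\ell) = (2\pi)^{-nN_j/2}\, q_j(x_j;\, \mu_\ell, \Sigma_\ell) \tag{3}$$

where

$$q_j(x_j;\, \mu_\ell, \Sigma_\ell) = |\Sigma_\ell|^{-N_j/2} \exp\left\{ -\tfrac{1}{2} \left[ \mathrm{tr}\, \Sigma_\ell^{-1} S_j + N_j (m_j - \mu_\ell)^T \Sigma_\ell^{-1} (m_j - \mu_\ell) \right] \right\}. \tag{4}$$

Let

$$q_j(x_j \mid \psi) = \sum_{\ell=1}^{m} \alpha_\ell\, q_j(x_j;\, \mu_\ell, \Sigma_\ell). \tag{5}$$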
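
Identity (3) says that the product of the pixel densities in a field depends on the data only through the field size, mean and scatter. The sketch below checks this numerically for the univariate case (n = 1, so the covariance is a scalar variance). The function names `normal_pdf`, `product_density` and `q_field` are illustrative and do not come from the source.

```python
import math

def normal_pdf(x, mu, var):
    """Univariate normal density with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def product_density(xs, mu, var):
    """Left side of (3): product of the N_j pixel densities in one field."""
    out = 1.0
    for x in xs:
        out *= normal_pdf(x, mu, var)
    return out

def q_field(xs, mu, var):
    """q_j of (4) for n = 1, computed only from N_j, the mean m_j and the scatter S_j."""
    n = len(xs)
    m = sum(xs) / n
    s = sum((x - m) ** 2 for x in xs)          # unnormalized scatter S_j
    return var ** (-n / 2) * math.exp(-0.5 * (s + n * (m - mu) ** 2) / var)

def identity_gap(xs, mu, var):
    """Relative difference between the two sides of (3)."""
    lhs = product_density(xs, mu, var)
    rhs = (2 * math.pi) ** (-len(xs) / 2) * q_field(xs, mu, var)
    return abs(lhs - rhs) / lhs
```

Calling `identity_gap` on any small list of numbers should return a value at the level of floating-point rounding, which is what (3) predicts.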


By ignoring terms which are independent of the parameters we derive the log
likelihood function

$$L_p(\psi) = \sum_{j=1}^{p} \log q_j(x_j \mid \psi) \tag{6}$$


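which leads to the following necessary conditions for a local maximum of the
likelihood function. Equations (7) - (9) are called the likelihood equations
for the present model.

$$\alpha_\ell = \frac{\alpha_\ell}{p} \sum_{j=1}^{p} \frac{q_j(x_j;\, \mu_\ell, \Sigma_\ell)}{q_j(x_j \mid \psi)} \tag{7}$$

$$\mu_\ell = \left. \sum_{j=1}^{p} \frac{N_j\, q_j(x_j;\, \mu_\ell, \Sigma_\ell)}{q_j(x_j \mid \psi)}\, m_j \right/ \sum_{j=1}^{p} \frac{N_j\, q_j(x_j;\, \mu_\ell, \Sigma_\ell)}{q_j(x_j \mid \psi)} \tag{8}$$

$$\Sigma_\ell = \left. \left\{ \sum_{j=1}^{p} \frac{q_j(x_j;\, \mu_\ell, \Sigma_\ell)}{q_j(x_j \mid \psi)}\, S_j + \sum_{j=1}^{p} \frac{N_j\, q_j(x_j;\, \mu_\ell, \Sigma_\ell)}{q_j(x_j \mid \psi)}\, (m_j - \mu_\ell)(m_j - \mu_\ell)^T \right\} \right/ \sum_{j=1}^{p} \frac{N_j\, q_j(x_j;\, \mu_\ell, \Sigma_\ell)}{q_j(x_j \mid \psi)} \tag{9}$$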
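
One way to read equations (7) - (9) is as a fixed-point map: evaluate the right-hand sides at the current parameter value and take the results as the next value. The following is a minimal sketch of that idea for univariate data (n = 1). It is an assumption about how such an iteration might be organized, not the numerical procedure of [5], and all function names are hypothetical.

```python
import math
import random

def field_stats(xs):
    """Return (N_j, m_j, S_j): size, mean and unnormalized scatter of one field."""
    n = len(xs)
    m = sum(xs) / n
    return n, m, sum((x - m) ** 2 for x in xs)

def log_q(stat, mu, var):
    """log q_j(x_j; mu, var) from equation (4), specialised to n = 1."""
    n, m, s = stat
    return -0.5 * n * math.log(var) - 0.5 * (s + n * (m - mu) ** 2) / var

def update(stats, alpha, mu, var):
    """Evaluate the right-hand sides of (7)-(9) at the current (alpha, mu, var)."""
    p, m_cls = len(stats), len(alpha)
    w = []  # w[j][l] = alpha_l q_j(x_j; mu_l, var_l) / q_j(x_j | psi)
    for st in stats:
        logs = [math.log(alpha[l]) + log_q(st, mu[l], var[l]) for l in range(m_cls)]
        top = max(logs)                       # subtract the max for numerical stability
        ex = [math.exp(v - top) for v in logs]
        tot = sum(ex)
        w.append([e / tot for e in ex])
    new_alpha, new_mu, new_var = [], [], []
    for l in range(m_cls):
        wn = sum(w[j][l] * stats[j][0] for j in range(p))
        new_alpha.append(sum(w[j][l] for j in range(p)) / p)                       # (7)
        new_mu.append(sum(w[j][l] * stats[j][0] * stats[j][1] for j in range(p)) / wn)  # (8)
        new_var.append(sum(w[j][l] * (stats[j][2] + stats[j][0] * (stats[j][1] - mu[l]) ** 2)
                           for j in range(p)) / wn)                                # (9)
    return new_alpha, new_mu, new_var

def simulate_fields(p, classes, alpha, max_size, rng):
    """Draw p fields; every pixel in a field shares the field's hidden class."""
    stats = []
    for _ in range(p):
        l = rng.choices(range(len(classes)), weights=alpha)[0]
        mu, var = classes[l]
        n = rng.randint(1, max_size)
        stats.append(field_stats([rng.gauss(mu, math.sqrt(var)) for _ in range(n)]))
    return stats
```

Repeated calls to `update` on simulated fields with well separated classes should settle near the generating parameters. The weights `w[j][l]` are computed per field, not per pixel, which is where the field structure of the model enters.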


2. The General Theorem 


Let $\Omega$ be an open subset of $\mathbb{R}^\nu$ and let $\psi^0 \in \Omega$. Suppose $X_1, X_2, \ldots$
is a sequence of independent random vectors with $X_r$ having $N_r$-variate density
function $q_r(\cdot \mid \psi^0)$ with respect to some fixed $\sigma$-finite measure $\lambda_r$ on $\mathbb{R}^{N_r}$.
Suppose the densities $q_r(\cdot \mid \psi)$ are defined for each $\psi \in \Omega$. Given a
positive integer $p$, define a maximum likelihood estimate of $\psi^0$ to be an
element $\psi \in \Omega$ which locally maximizes $L_p(\psi) = \sum_{r=1}^{p} \log q_r(X_r \mid \psi)$. The equation
$D_\psi L_p(\psi) = 0$ will be called the likelihood equation, where the symbol $D_\psi$
denotes the Fréchet derivative with respect to $\psi$.


A number of theorems dealing with the consistency of maximum likelihood
estimates, under the additional assumption that the $X_r$'s are identically
distributed, have been presented in the literature (see for instance Chanda [?],
Cramer [4], and Wald [8]). Extending any of these results to the case of
nonidentically distributed observations is primarily a matter of finding a
convenient set of conditions which ensures that a law of large numbers can be
invoked at several points in the proofs. The following theorem is such an
outgrowth of the proof of strong consistency contained in [6].


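Theorem 1: Suppose there is a neighborhood $\Omega'$ of $\psi^0$ and a $\lambda_r$-null set
in $\mathbb{R}^{N_r}$ such that for all $\psi \in \Omega'$, all $x$ outside the null set, all
$i, j, k = 1, \ldots, \nu$ and all $r \in \mathbb{N}$ (the positive integers), the derivatives

$$\frac{\partial q_r(x \mid \psi)}{\partial \psi_i}, \qquad \frac{\partial^2 q_r(x \mid \psi)}{\partial \psi_i\, \partial \psi_j}, \qquad \text{and} \qquad \frac{\partial^3 \log q_r(x \mid \psi)}{\partial \psi_i\, \partial \psi_j\, \partial \psi_k}$$

exist and satisfy:

$$\left| \frac{\partial q_r(x \mid \psi)}{\partial \psi_i} \right| \le f_{ir}(x) \tag{i}$$

$$\left| \frac{\partial^2 q_r(x \mid \psi)}{\partial \psi_i\, \partial \psi_j} \right| \le f_{ijr}(x) \tag{ii}$$

$$\left| \frac{\partial^3 \log q_r(x \mid \psi)}{\partial \psi_i\, \partial \psi_j\, \partial \psi_k} \right| \le f_{ijkr}(x) \tag{iii}$$

where $f_{ir}$ and $f_{ijr}$ are $\lambda_r$-integrable on $\mathbb{R}^{N_r}$ and

$$E[f_{ijkr}(X_r)^2] = \int_{\mathbb{R}^{N_r}} f_{ijkr}(x)^2\, q_r(x \mid \psi^0)\, d\lambda_r(x) \le M \tag{iv}$$

for all $r \in \mathbb{N}$, where $M$ is a constant. Suppose also that

$$E\left[ \left( \frac{\partial \log q_r(X_r \mid \psi^0)}{\partial \psi_i} \right)^4 \right] \le M \tag{v}$$

and

$$E\left[ \left( \frac{1}{q_r(X_r \mid \psi^0)} \frac{\partial^2 q_r(X_r \mid \psi^0)}{\partial \psi_i\, \partial \psi_j} \right)^2 \right] \le M \tag{vi}$$

for all $i, j = 1, \ldots, \nu$ and $r \in \mathbb{N}$. Finally suppose that there is an $\epsilon > 0$ such that

$$J_r(\psi^0) = E\left[ \nabla_\psi \log q_r(X_r \mid \psi^0)\, \nabla_\psi \log q_r(X_r \mid \psi^0)^T \right] \ge \epsilon I \tag{vii}$$

for all $r \in \mathbb{N}$, where the ordering is the usual one on $\nu \times \nu$ symmetric matrices.
Then, it is almost surely true that, given a sufficiently small neighborhood
of $\psi^0$, for large $p$ there is a unique solution of the likelihood equation
$D_\psi L_p(\psi) = 0$ in that neighborhood. Furthermore, that solution is a maximum
likelihood estimate.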

Remark: In the proof we make repeated use of the following simple version
of the strong law of large numbers (see Chung [3]): Let $Z_1, Z_2, \ldots$ be
uncorrelated random variables and suppose the sequence of variances $\{\mathrm{var}(Z_i)\}_{i=1}^{\infty}$
is bounded. Then

$$\frac{1}{n} \sum_{i=1}^{n} (Z_i - E(Z_i)) \to 0 \quad \text{a.s. as } n \to \infty.$$
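
The point of this version of the strong law is that the summands need not share a distribution; bounded variances suffice. The small simulation below illustrates this with independent (hence uncorrelated) normal variables whose means and standard deviations change from term to term. The particular means and standard deviations are arbitrary choices made for illustration only.

```python
import random

def centered_average(n, seed=0):
    """Average of Z_i - E(Z_i) over n independent, non-identically distributed Z_i."""
    rng = random.Random(seed)
    total = 0.0
    for i in range(1, n + 1):
        mean_i = float(i % 7)          # the mean changes with i
        sd_i = 1.0 + 0.5 * (i % 3)     # standard deviation is 1.0, 1.5 or 2.0, so variances are bounded
        total += rng.gauss(mean_i, sd_i) - mean_i
    return total / n
```

For large `n` the returned value should be close to zero, in line with the statement of the strong law.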

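Proof of the theorem: Let $\ell_p(\psi) = \frac{1}{p} \sum_{r=1}^{p} D_\psi \log q_r(X_r \mid \psi)$. By assumption (i),
$E(\ell_p(\psi^0)) = 0$ and by assumption (v) and the strong law, $\ell_p(\psi^0) \to 0$ a.s. as
$p \to \infty$. Now consider the $\nu \times \nu$ matrix $D\ell_p(\psi^0)$ whose $ij$th element is

$$\frac{1}{p} \sum_{r=1}^{p} \frac{\partial^2 \log q_r(X_r \mid \psi^0)}{\partial \psi_i\, \partial \psi_j} = \frac{1}{p} \sum_{r=1}^{p} \frac{1}{q_r(X_r \mid \psi^0)} \frac{\partial^2 q_r(X_r \mid \psi^0)}{\partial \psi_i\, \partial \psi_j} - \frac{1}{p} \sum_{r=1}^{p} \frac{\partial \log q_r(X_r \mid \psi^0)}{\partial \psi_i} \frac{\partial \log q_r(X_r \mid \psi^0)}{\partial \psi_j}.$$

By assumption (ii) the expected value of the first term on the right is zero.
Hence, by assumptions (v) and (vi), $D\ell_p(\psi^0) + \frac{1}{p} \sum_{r=1}^{p} J_r(\psi^0) \to 0$ a.s. as
$p \to \infty$. It follows that with probability 1, for each $\eta$ in $0 < \eta < \epsilon/2$
there is a $p_0 \in \mathbb{N}$ so that for $p \ge p_0$

$$D\ell_p(\psi^0) \le -2\eta I.$$

Without loss of generality we can assume $\Omega'$ is convex. Thus, for $\psi \in \Omega'$,

$$\left| \frac{1}{p} \sum_{r=1}^{p} \left[ \frac{\partial^2 \log q_r(X_r \mid \psi)}{\partial \psi_i\, \partial \psi_j} - \frac{\partial^2 \log q_r(X_r \mid \psi^0)}{\partial \psi_i\, \partial \psi_j} \right] \right| = \left| \frac{1}{p} \sum_{r=1}^{p} \sum_{k=1}^{\nu} (\psi_k - \psi_k^0) \int_0^1 \frac{\partial^3 \log q_r(X_r \mid \psi^0 + t(\psi - \psi^0))}{\partial \psi_i\, \partial \psi_j\, \partial \psi_k}\, dt \right| \le \frac{1}{p} \sum_{r=1}^{p} \sum_{k=1}^{\nu} |\psi_k - \psi_k^0|\, f_{ijkr}(X_r).$$

With probability 1, for large $p$

$$\frac{1}{p} \sum_{r=1}^{p} f_{ijkr}(X_r) \le 1 + \frac{1}{p} \sum_{r=1}^{p} E[f_{ijkr}(X_r)] \le 1 + M^{1/2}$$

by assumption (iv).

It follows that for any particular norms on $\mathbb{R}^\nu$ and on the $\nu \times \nu$ symmetric
matrices there is a constant $R$ such that with probability 1 there is a
$p_1 \in \mathbb{N}$ such that for all $p \ge p_1$ and $\psi \in \Omega'$,

$$\| D\ell_p(\psi) - D\ell_p(\psi^0) \| \le R\, \| \psi - \psi^0 \|.$$

Thus, there is a convex neighborhood $\Omega^0$ of $\psi^0$ such that

$$D\ell_p(\psi) \le -\eta I$$

for all $\psi \in \Omega^0$, $p \ge p_1$. It now follows as in [6] that for $p \ge p_1$,
$\ell_p$ is one to one on $\Omega^0$ and that the image under $\ell_p$ of the sphere
$\Omega_\delta(\psi^0)$ at $\psi^0$ of small radius $\delta$ contains the sphere $\Omega_{\eta\delta}(\ell_p(\psi^0))$ at $\ell_p(\psi^0)$ of
radius $\eta\delta$. Since 0 is eventually in $\Omega_{\eta\delta}(\ell_p(\psi^0))$, there is a unique
solution of $\ell_p(\psi) = 0$ in $\Omega_\delta(\psi^0)$. Since $D\ell_p(\psi)$ is negative definite,
this solution is a maximum likelihood estimate. This concludes the proof.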

Theorem 1 shows that by restricting attention to a fixed neighborhood
of $\psi^0$ it is possible to speak unambiguously of the unique consistent
solution of the likelihood equations or, equivalently, of the unique
consistent MLE of $\psi^0$. This terminology will be adopted in the next theorem.


3. Application to Exponential Mixtures

In this section we apply Theorem 1 to a class of mixture models which
contains the normal mixture model of Section 1. Referring to the notation of
that section, we assume that conditioned on $\Theta_j = \ell$, the random $n$-vectors
$X_{j1}, \ldots, X_{jN_j}$ are independent with a common density of exponential type

$$f(x \mid \tau_\ell) = C(\tau_\ell) \exp \langle \tau_\ell \mid F(x) \rangle \tag{1}$$

with respect to a dominating $\sigma$-finite measure $\lambda$, where the parameter $\tau_\ell$
is an arbitrary member of an open subset $U$ of a finite dimensional vector
space $V$ with inner product $\langle \cdot \mid \cdot \rangle$. We assume also that $C$ is one to one
and that the support of the measure induced on $U$ by $F$ and $\lambda$ contains
an open set. These conditions imply that the parameter is identifiable
[1], and any parametrization of the form (1) satisfying them will be called
a canonical representation of the given family of distributions.

The joint density, given $\Theta_j = \ell$, of $X_j = (X_{j1}, \ldots, X_{jN_j})$ is also of
exponential type; i.e., for $x_j = (x_{j1}, \ldots, x_{jN_j})$

$$p_j(x_j \mid \tau_\ell) = C(\tau_\ell)^{N_j} \exp \langle \tau_\ell \mid G_j(x_j) \rangle \tag{2}$$

where

$$G_j(x_j) = \sum_{k=1}^{N_j} F(x_{jk})$$

and the representation (2) is also canonical.

Some useful facts about exponential families are collected in the
following lemma. For proofs see Barndorff-Nielsen [1].

Lemma 1: Let (1) be a canonical representation of an exponential family.
For each $\tau \in U$ let $\kappa(\tau) = -\ln C(\tau) = \ln \int_{\mathbb{R}^n} \exp \langle \tau \mid F(x) \rangle\, d\lambda(x)$. Then

(i) for each $\tau \in U$, $F(x)$ has moments of all orders with respect
to $f(x \mid \tau)$;

(ii) $\kappa(\tau)$ has derivatives of all orders with respect to $\tau$, which
may be obtained by differentiating under the integral sign. Indeed
$D_\tau^k \kappa(\tau)$ can be represented as a symmetric $k$-linear form
on $V$ which is a polynomial in the first $k$ moments of $F$. In
particular,

(iii) $D_\tau \kappa(\tau) = \langle E_\tau(F) \mid \cdot \rangle = \int_{\mathbb{R}^n} \langle F(x) \mid \cdot \rangle\, f(x \mid \tau)\, d\lambda(x)$

and

(iv) $D_\tau^2 \kappa(\tau) = \mathrm{cov}_\tau(F) = \int_{\mathbb{R}^n} \langle F(x) - E_\tau(F) \mid \cdot \rangle^2\, f(x \mid \tau)\, d\lambda(x)$, which is
positive definite.

(v) $\kappa(\tau)$ is strictly convex on $U$.

(Expressions $\langle \sigma \mid \cdot \rangle^k$ like that in (iv) are meant to denote $k$-linear forms;
e.g. $\langle \sigma \mid \cdot \rangle^2$ denotes the bilinear form $b(\eta, \nu) = \langle \sigma \mid \eta \rangle \langle \sigma \mid \nu \rangle$.)
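
Parts (iii) and (iv) of the lemma can be checked numerically on a familiar one-parameter family. The sketch below uses the Poisson family as an assumed example (it is not discussed in the source): take $F(x) = x$ and let $\lambda$ put mass $1/x!$ on each nonnegative integer $x$, so that $\kappa(\tau) = e^\tau$. Finite differences of $\kappa$ are compared with the mean and variance of $F$; the function names are illustrative.

```python
import math

def kappa(tau, terms=200):
    """kappa(tau) = ln sum_x exp(tau * x) / x!, truncated after `terms` terms."""
    return math.log(sum(math.exp(tau * x - math.lgamma(x + 1)) for x in range(terms)))

def moments(tau, terms=200):
    """Mean and variance of F(x) = x under f(x|tau) = exp(tau * x - kappa(tau)) / x!."""
    k = kappa(tau, terms)
    probs = [math.exp(tau * x - math.lgamma(x + 1) - k) for x in range(terms)]
    mean = sum(x * pr for x, pr in enumerate(probs))
    var = sum((x - mean) ** 2 * pr for x, pr in enumerate(probs))
    return mean, var

def derivatives(tau, h=1e-4):
    """Central finite-difference approximations to D kappa and D^2 kappa at tau."""
    d1 = (kappa(tau + h) - kappa(tau - h)) / (2 * h)
    d2 = (kappa(tau + h) - 2 * kappa(tau) + kappa(tau - h)) / h ** 2
    return d1, d2
```

Comparing `derivatives(tau)` with `moments(tau)` at any moderate `tau` should show agreement up to finite-difference error, and the positive second derivative reflects the strict convexity in (v).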

We are now ready to apply Theorem 1 to the mixture model

$$q(x \mid \psi) = \prod_{j=1}^{p} q_j(x_j \mid \psi) \tag{3}$$

where $\psi = (\alpha_1, \ldots, \alpha_{m-1}, \tau_1, \ldots, \tau_m)$, $x = (x_1, \ldots, x_p)$,

$$q_j(x_j \mid \psi) = \sum_{\ell=1}^{m} \alpha_\ell\, p_j(x_j \mid \tau_\ell), \tag{4}$$

$\alpha_m = 1 - \sum_{\ell=1}^{m-1} \alpha_\ell$, $x_j = (x_{j1}, \ldots, x_{jN_j})$, and
$p_j(x_j \mid \tau_\ell)$ has the canonical exponential representation given in (2).

Theorem 2: If the numbers $\{N_j\}$ in the mixture model (3) are bounded, then
with probability 1 there is a unique consistent MLE of the parameter $\psi^0$.

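Proof: Using Lemma 1 and writing $u_j(\tau) = E_\tau(G_j)$, the nonzero derivatives
of $q_j$ of up to order 2 are:

$$D_{\alpha_\ell} q_j(x_j \mid \psi) = p_j(x_j \mid \tau_\ell) - p_j(x_j \mid \tau_m), \quad \ell = 1, \ldots, m-1 \tag{5}$$

$$D_{\tau_\ell} q_j(x_j \mid \psi) = \alpha_\ell\, p_j(x_j \mid \tau_\ell)\, \langle G_j(x_j) - u_j(\tau_\ell) \mid \cdot \rangle, \quad \ell = 1, \ldots, m \tag{6}$$

$$D_{\alpha_\ell} D_{\tau_\ell} q_j(x_j \mid \psi) = p_j(x_j \mid \tau_\ell)\, \langle G_j - u_j(\tau_\ell) \mid \cdot \rangle, \quad \ell = 1, \ldots, m-1 \tag{7}$$

$$D_{\alpha_\ell} D_{\tau_m} q_j(x_j \mid \psi) = -p_j(x_j \mid \tau_m)\, \langle G_j - u_j(\tau_m) \mid \cdot \rangle, \quad \ell = 1, \ldots, m-1 \tag{8}$$

$$D_{\tau_\ell}^2 q_j(x_j \mid \psi) = \alpha_\ell\, p_j(x_j \mid \tau_\ell) \left\{ \langle G_j - u_j(\tau_\ell) \mid \cdot \rangle^2 - \mathrm{cov}_{\tau_\ell}(G_j) \right\}, \quad \ell = 1, \ldots, m. \tag{9}$$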

Instead of verifying conditions (i) and (ii) of Theorem 1, it is easier
to recall that they were needed only in order to conclude that the integrals
of the first and second order derivatives of $q_j(x_j \mid \psi)$ are zero at $\psi = \psi^0$.
This is obvious from (5) - (9). Similarly, using Lemma 1 and the boundedness
of $\{N_j\}$ the verification of conditions (iii) - (vi) presents no problem
more serious than tedium. It remains to verify condition (vii). We may
write $J_r(\psi)$ in matrix form as

$$J_r(\psi) = [\ldots]$$

where $I_1$ and $I_2$ are, respectively, the identity operators on $\mathbb{R}^{m-1}$
and $V^m$, and

$$C_r = [\ldots]$$

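We remark that if $\tau_1, \ldots, \tau_m$ are distinct then as functions of $F \in U$,
$e^{\langle \tau_1 \mid F \rangle}, \ldots, e^{\langle \tau_m \mid F \rangle}, e^{\langle \tau_1 \mid F \rangle} F, \ldots, e^{\langle \tau_m \mid F \rangle} F$ are linearly independent; i.e.,
if $\lambda_1, \ldots, \lambda_m$ are scalars, $A_1, \ldots, A_m \in V$ and

$$\lambda_1 e^{\langle \tau_1 \mid F \rangle} + \cdots + \lambda_m e^{\langle \tau_m \mid F \rangle} + e^{\langle \tau_1 \mid F \rangle} \langle F \mid A_1 \rangle + \cdots + e^{\langle \tau_m \mid F \rangle} \langle F \mid A_m \rangle = 0$$

for all $F \in U$, then $\lambda_1 = \cdots = \lambda_m = 0$ and $A_1 = \cdots = A_m = 0$. It is easily seen that if
$J_r(\psi)$ fails to be positive definite then there is a nontrivial linear
combination of these functions which is zero almost surely with respect
to the distribution of $F$. It follows that $J_r(\psi)$ is positive definite
for each $r$. Condition (vii) will be established once it is shown that
the smallest eigenvalue of $J_r(\psi)$ is bounded away from zero as $N_r \to \infty$.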

Let $\sigma(A)$ denote the smallest eigenvalue of a positive definite
operator $A$. Clearly,

$$\sigma(J_r(\psi)) \ge \sigma([\ldots]).$$


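Observe that

$$\frac{p_r(X_r \mid \tau_\ell)}{p_r(X_r \mid \tau_k)} = \exp\left\{ -N_r \left[ \kappa(\tau_\ell) - \kappa(\tau_k) - \langle \tau_\ell - \tau_k \mid N_r^{-1} G_r \rangle \right] \right\}.$$

If $X_r$ is a sample from $f(x \mid \tau_k)$, then the expression in square brackets
converges to

$$\kappa(\tau_\ell) - \kappa(\tau_k) - \langle \tau_\ell - \tau_k \mid E_{\tau_k}(F) \rangle = \kappa(\tau_\ell) - \kappa(\tau_k) - \kappa'(\tau_k) \cdot (\tau_\ell - \tau_k)$$

which is $> 0$ for $\ell \ne k$ by the strict convexity of $\kappa$. Hence,

$$\frac{p_r(X_r \mid \tau_\ell)}{p_r(X_r \mid \tau_k)} \to 0 \quad \text{as } N_r \to \infty.$$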
r 


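Therefore,

$$\frac{p_r(X_r \mid \tau_\ell)\, p_r(X_r \mid \tau_k)}{q_r(X_r \mid \psi)^2}$$

converges to 0 if $\ell \ne k$ and to $1/\alpha_k^2$ if $\ell = k$ as $N_r \to \infty$. Thus,

$$[\ldots] \to \left( \frac{1}{\alpha_m} + \frac{\delta_{\ell k}}{\alpha_k} \right) \quad \text{as } N_r \to \infty.$$

Given that $X_r$ is from $f(x \mid \tau_k)$, $N_r^{-1/2} (G_r - u_r(\tau_k))$ converges in distribution
to a normal random variable $Z$ with mean zero and covariance $\mathrm{cov}_{\tau_k}(F)$. Hence

$$\frac{p_r(X_r \mid \tau_\ell)}{q_r(X_r \mid \psi)}\, N_r^{-1/2} (G_r - u_r(\tau_k))$$

converges in distribution to 0 if $\ell \ne k$ and to $\frac{1}{\alpha_k} Z$ if $\ell = k$.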

Let $A$ be any element of $V$ and consider

$$\left[ N_r^{-1/2} \langle G_r - u_r(\tau_k) \mid A \rangle \right]^4 = N_r^{-2} \left[ \sum_{j=1}^{N_r} \langle F(X_{rj}) - E_{\tau_k}(F) \mid A \rangle \right]^4.$$

After expanding and taking expectation with respect to $\tau_k$ it will be
seen that the only nonvanishing terms are those of the form

$$E_{\tau_k}\left[ \langle F(X_{rj}) - E_{\tau_k}(F) \mid A \rangle^2\, \langle F(X_{rj'}) - E_{\tau_k}(F) \mid A \rangle^2 \right]$$

of which there are $O(N_r^2)$. Thus

$$E_{\tau_k}\left[ N_r^{-1/2} \langle G_r - u_r(\tau_k) \mid A \rangle \right]^4$$

is bounded as $N_r \to \infty$. It follows from a standard theorem on convergence
of moments [3, p. 95] that

$$E_{\tau_k}\left[ \frac{p_r(X_r \mid \tau_\ell)}{q_r(X_r \mid \psi)}\, N_r^{-1/2} (G_r - u_r(\tau_k)) \right] \to 0 \quad \text{as } N_r \to \infty.$$

Thus $E_\psi(B_r) \to 0$. Similar reasoning shows that

$$E_\psi(C_r) \to \left( \delta_{\ell k}\, \alpha_k\, \mathrm{cov}_{\tau_k}(F) \right)$$

as $N_r \to \infty$. Therefore $\sigma(J_r(\psi))$ is bounded away from 0 and this concludes
the proof.


4. Concluding Remarks

Clearly the assumption in Theorem 2 that $\{N_r\}$ is bounded can be
weakened. In fact Theorem 1 could be modified in such a way as to show that
the MLE of exponential mixture parameters is strongly consistent when

$$\sum_r N_r / r^2 < \infty.$$

Redner [7] has shown that when each $N_r = 1$, a certain numerical
procedure for obtaining the MLE of exponential mixture parameters is
convergent. The generalization to bounded $\{N_r\}$ should not be difficult, and
will be addressed in a future report.